Thank you for taking the time to review our work so thoroughly. Our implementation of the ACoE methods is in the file steps.py; the logic for the A3B and A2B belief computations is in the acoe_step function.

In the file agent.py, the function acoe_advantage_and_return specifies how we compute the advantage-like term used for ACoE-to-go minimization.

Finally, the robust_q_ppo_step function in steps.py shows how ACoE is minimized using PPO.

Our code is a fork of the open-source project WocaR-RL, graciously provided by the authors at UMD at https://github.com/umd-huang-lab/WocaR-RL.